Clustering Mixed Data via Diffusion Maps

Authors

  • Gil David
  • Amir Averbuch

Abstract

Data clustering is a common technique for statistical data analysis that is used in many fields, including machine learning, data mining, customer segmentation, trend analysis, pattern recognition and image analysis. Although many clustering algorithms have been proposed, most of them deal with numerical data. Finding the similarity between numeric objects usually relies on a common distance measure such as the Euclidean distance. Clustering categorical (nominal) data is more difficult because nominal values have no inherent order or distance. As a result, the common distance measures used for numeric data are not applicable to nominal objects. Moreover, data in real applications often mix attribute types, with numeric and nominal attributes residing together. In this paper, we propose a technique that solves this problem. We suggest transforming the input data (categorical and numerical) into categorical values. This is achieved by automatic non-linear transformations, which identify geometric patterns in these datasets, that find the connections among
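
The abstract's point that numeric distance measures do not carry over to nominal data can be illustrated with a minimal sketch. It is not the authors' method: the colour values, the integer codes, and the helper names `euclidean` and `simple_matching` are all assumptions made for this illustration. It shows that Euclidean distance on arbitrarily coded labels invents an ordering, while a simple matching dissimilarity treats all distinct labels alike.

```python
import math

# Hypothetical integer codes for a nominal attribute (assumed for illustration).
# The numbering is arbitrary: nothing makes "blue" farther from "red" than "green".
COLOUR_CODES = {"red": 0, "green": 1, "blue": 2}


def euclidean(a, b):
    """Euclidean distance between two numeric tuples of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def simple_matching(a, b):
    """Fraction of attributes on which two nominal records disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def encode(record):
    """Map each nominal value in a record to its arbitrary integer code."""
    return tuple(COLOUR_CODES[value] for value in record)


red, green, blue = ("red",), ("green",), ("blue",)

# Euclidean distance on the codes depends on the arbitrary numbering.
print(euclidean(encode(red), encode(green)), euclidean(encode(red), encode(blue)))

# Simple matching only asks whether the labels differ.
print(simple_matching(red, green), simple_matching(red, blue))
```

Under this coding, Euclidean distance reports red as twice as far from blue as from green, purely because of how the labels were numbered; the matching dissimilarity gives the same value for both pairs.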

Similar resources

Diffusion Maps Clustering for Magnetic Resonance Q-Ball Imaging Segmentation

White matter fiber clustering aims to get insight about anatomical structures in order to generate atlases, perform clear visualizations, and compute statistics across subjects, all important and current neuroimaging problems. In this work, we present a diffusion maps clustering method applied to diffusion MRI in order to segment complex white matter fiber bundles. It is well known that diffusi...

Fuzzy Adaptive Resonance Theory, Diffusion Maps and their applications to Clustering and Biclustering

In this paper, we describe an algorithm FARDiff (Fuzzy Adaptive Resonance Diffusion) which combines Diffusion Maps and Fuzzy Adaptive Resonance Theory to do clustering and biclustering on high dimensional data. We describe some applications of this method.

Unsupervised Clustering Using Diffusion Maps for Local Shape Modelling

Understanding the biological variability of anatomical objects is essential for statistical shape analysis and to distinguish between healthy and pathological structures. Statistical Shape Modelling (SSM) can be used to analyse the shapes of sub-structures aiming to describe their variation across individual objects and between groups of them [1]. However, when the shapes exhibit self-similarit...

Multimodal diffusion geometry by joint diagonalization of Laplacians

We construct an extension of diffusion geometry to multiple modalities through joint approximate diagonalization of Laplacian matrices. This naturally extends classical data analysis tools based on spectral geometry, such as diffusion maps and spectral clustering. We provide several synthetic and real examples of manifold learning, retrieval, and clustering demonstrating that the joint diffusio...

Manifold Learning and Dimensionality Reduction with Diffusion Maps

This report gives an introduction to diffusion maps, some of their underlying theory, as well as their applications in spectral clustering. First, the shortcomings of linear methods such as PCA are shown to motivate the use of graph-based methods. We then explain Locally Linear Embedding [9], Isomap [11] and Laplacian eigenmaps [1], before we give details on diffusion maps and anisotropic diffu...

Publication date: 2009